Optimal Non-coherent Data Detection for
Massive SIMO Wireless Systems: 

A Polynomial Complexity Solution 

Haider Ali Jasim Alshamary, Md Fahim Anjum, Tareq Al-Naffouri, Alam Zaib, Weiyu Xu 


Abstract —Massive MIMO systems can greatly increase spectral 
and energy efficiency over traditional MIMO systems by exploiting
large antenna arrays. However, increasing the number of
antennas at the base station (BS) makes the uplink noncoherent 
data detection very challenging in massive MIMO systems. In this 
paper we consider the joint maximum likelihood (ML) channel 
estimation and data detection problem for massive SIMO (single 
input multiple output) wireless systems, which is a special case 
of wireless systems with large antenna arrays. We propose exact 
ML non-coherent data detection algorithms for both constant- 
modulus and nonconstant-modulus constellations, with a low 
expected complexity. Despite the large number of unknown 
channel coefficients for massive SIMO systems, we show that the 
expected computational complexity of these algorithms is linear 
in the number of receive antennas and polynomial in channel 
coherence time. Simulation results show the performance gains 
(up to 5 dB improvement) of the optimal non-coherent data 
detection with a low computational complexity. 

Keywords—ML detection, channel estimation, massive SIMO, 
maximum likelihood, sphere decoder 

I. Introduction 

Multiple-antenna arrays are well known for their benefits:
high reliability, high spectral efficiency and interference
reduction. Recently, a new approach, massive MIMO, has emerged,
in which communication terminals are equipped with a very large
number of antennas. This reaps the benefits of traditional
MIMO systems on a much larger scale. In [?], the authors
showed mathematically that the effects of fast fading and
uncorrelated noise vanish as the number of receive antennas
approaches infinity. This pioneering work has generated
extensive research interest in massive MIMO wireless systems.
For example, the information-theoretic and propagation aspects
of massive MIMO systems are discussed in [?].
Research on massive MIMO has also focused on many other
aspects, including transmit and receive schemes, the effect of
pilot contamination, energy efficiency, and channel estimation,
as reviewed in [?].

To achieve the promised advantages of massive MIMO
systems, knowledge of the channel state information (CSI) is
required for performing uplink data detection and downlink
beamforming [2]. However, accurately estimating the channel
coefficients is a major challenge in wireless systems, especially
in fast fading environments and in massive MIMO systems.
Indeed, allocating pilot symbols to estimate time-varying
channels in multi-cell massive MIMO systems results in
pilot contamination, which is a fundamental limiting
factor for the performance of massive MIMO systems [?].

Compared with traditional MIMO systems, accurate channel state
estimation is even more challenging for massive MIMO systems,
since they have a large number of unknown channel coefficients.
In conventional MIMO systems, differential modulation techniques,
blind and semi-blind algorithms, and pilot-based algorithms are
used to solve the problem of channel tracking [?]. Although
these algorithms have improved the performance of traditional
non-coherent MIMO systems, they are not optimized, in terms of
detection performance and complexity, for antenna arrays with a
large number of time-varying non-coherent channels. It is of
great theoretical and practical interest to investigate
near-optimal or optimal joint channel estimation and data
detection schemes for massive MIMO systems [?].
For example, performing joint channel estimation and data
detection will help alleviate the pilot contamination issues in
multi-cell massive MIMO systems [?].

In conventional MIMO systems, most existing efficient
non-coherent signal detection algorithms are suboptimal in
performance compared with exact ML non-coherent data
detection. However, there are a few exceptions. For
instance, the sphere decoder algorithm was used in [12] and
[13] to solve the joint ML non-coherent problem for SIMO
wireless systems, but only for constant-modulus constellations
(such as BPSK and QPSK). This sphere decoder reduces
the computational complexity by restricting the ML detection
search to a subset of the signal space. In [?], the authors also
proposed sphere decoder algorithms to achieve joint ML
channel estimation and data detection for orthogonal
space-time block coded (OSTBC) wireless systems. In [13] and
[?], the sphere decoder algorithms were shown to achieve the
exact ML non-coherent detection performance with a lower
complexity than that of the exhaustive search. However, the
sphere decoders proposed in [12] and [?] only work for
constant-modulus constellations. In another line of work, [14]
proposed an exact joint ML channel estimation and signal
detection algorithm for SIMO systems with general
constellations. In [?], the authors proposed an exact ML channel
estimation and data detection algorithm for OFDM wireless
systems with general constellations. In addition, [16] developed
an exact ML non-coherent data detection algorithm for OSTBC
systems with constant-modulus constellations, using recent
results on efficient maximization of reduced-rank quadratic
forms to achieve polynomial complexity.

The sphere decoders in [?], [12], [13] and the ML decoder
in [16] work only for constant-modulus constellations.
Furthermore, the optimal non-coherent data detection algorithms
from [?], [?] and [?] did not address the non-coherent
data detection complexity as the number of receive antennas
grows large in massive SIMO systems. The algorithm in [16]
gives an exact ML solution only when the matrix in the quadratic
form optimization has low rank, but this low-rank assumption
does not hold for SIMO systems with a large number of
receive antennas. Finding efficient exact ML non-coherent data
detection algorithms for massive MIMO systems (including
SIMO systems) with general constellations has remained open [2].

In this paper, we propose joint exact ML channel estimation
and data detection algorithms for massive SIMO systems,
which work with both constant-modulus and nonconstant-modulus
constellations. First, we propose efficient exact ML
non-coherent data detection algorithms for both constant-modulus
and nonconstant-modulus constellations. Second,
we show theoretically that the expected computational
complexity is linear in the number of receive antennas and
polynomial in the channel coherence time, which is surprising
considering the large number of unknown channel coefficients
in massive SIMO systems. Third, we propose a new ML tree
search algorithm (TSA) which achieves the exact ML
performance with near-optimal search complexity. To the best of
our knowledge, these are the first low-complexity
joint exact ML non-coherent data detection algorithms for
massive SIMO systems with general constellations. The only
other work which provides efficient exact ML non-coherent
data detection under general constellations is [14]. However,
the method in [14] is for traditional SIMO systems with a small
number of receive antennas, and cannot guarantee polynomial
expected complexity for massive SIMO systems. Moreover,
our algorithm is fundamentally different from the
approach in [14]. Simulation results demonstrate significant
performance gains of our optimal non-coherent data detection
algorithms. As a consequence of this work, we quantify
the exact performance gap between optimal and suboptimal
non-coherent data detection algorithms for massive SIMO
systems, under both constant-modulus and nonconstant-modulus
constellations.

We remark that, although this paper focuses on discussing 
massive SIMO systems, our proposed algorithms can serve as 
building blocks for performing iterative joint channel estimation
and data detection algorithms in general massive MIMO
systems. This is beyond the scope of this current journal paper, 
and we will leave it as future work. 

The rest of this paper is organized as follows. Section
II sets up the system model. Section III presents our ML
non-coherent data detection algorithm for constant-modulus
constellations, and derives the expected complexity of the
proposed exact ML non-coherent data detection algorithm.
Section IV presents the ML non-coherent data detection
algorithm for nonconstant-modulus constellations, and derives
its complexity. Section V proposes a new tree search algorithm
(TSA) for exact ML non-coherent detection, and derives the
complexity of the TSA. Simulation results are provided and
discussed in Section VI. Section VII concludes the paper and
highlights our contributions.


II. The Joint Channel Estimation and Signal
Detection Problem

Let T denote the length of a data packet during which the
channel remains constant. The channel output for a SIMO
system with N receive antennas is given by

$$X = h s^* + W, \tag{1}$$

where $h \in \mathbb{C}^{N \times 1}$ is the SIMO channel vector,
$s^* \in \Omega^{1 \times T}$ is the transmitted symbol sequence, and
$W \in \mathbb{C}^{N \times T}$ is an additive noise matrix whose
elements are assumed to be i.i.d. complex Gaussian random
variables. We also assume the entries of $s^*$ are i.i.d. symbols
from a certain modulation constellation $\Omega$ (such as BPSK or
16-QAM).

We treat h as a deterministic unknown channel with
no a priori information about it [?]. Then, the joint
ML channel estimation and data detection problem for SIMO
systems is given by the following mixed optimization problem

$$\min_{h,\; s \in \Omega^T} \|X - h s^*\|^2, \tag{2}$$

where $\Omega^T$ denotes the set of T-dimensional signal vectors.
From [?], the optimization of (2) over h is a least squares
problem, while the optimization of (2) over $s^*$ is an integer
least squares problem, since each element of $s^*$ is chosen from
a fixed constellation $\Omega$. By [?], for any given symbol vector
$s^*$, the channel vector h that minimizes (2) is

$$\hat{h} = X s (s^* s)^{-1} = \frac{X s}{\|s\|^2}. \tag{3}$$

Substituting (3) into (2), we get

$$\Big\| X \Big( I - \underbrace{\frac{s s^*}{\|s\|^2}}_{=P_s} \Big) \Big\|^2 = \mathrm{tr}(X X^*) - \frac{1}{\|s\|^2}\, s^* X^* X s. \tag{4}$$

Now, for the joint ML channel estimation and data detection,
we need to maximize $\frac{s^* X^* X s}{\|s\|^2}$ in (4). This
maximization depends on whether the constellation of the
transmitted signal is constant-modulus or not. For massive SIMO
wireless systems with a large number of unknown channel
coefficients, we develop algorithms to achieve the exact ML
non-coherent data detection with low expected complexity, for
both constant-modulus and nonconstant-modulus constellations.
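
As a numerical illustration of (2) to (4), the following minimal sketch computes the least-squares channel estimate for a candidate sequence and checks that the residual equals the reduced cost on the right-hand side of (4). The function names and the toy values of `h_true` and `s_true` are hypothetical, and matrices are stored as plain Python lists of rows; this is not the authors' implementation.

```python
def ls_channel_estimate(X, s):
    """Least-squares channel estimate X s / ||s||^2, as in (3)."""
    energy = sum(abs(sk) ** 2 for sk in s)
    return [sum(x * sk for x, sk in zip(row, s)) / energy for row in X]


def residual(X, h, s):
    """Squared Frobenius norm ||X - h s^*||^2 from (2)."""
    return sum(abs(X[n][t] - h[n] * s[t].conjugate()) ** 2
               for n in range(len(X)) for t in range(len(s)))


def reduced_cost(X, s):
    """Right-hand side of (4): tr(X X^*) - s^* X^* X s / ||s||^2."""
    energy = sum(abs(sk) ** 2 for sk in s)
    trace = sum(abs(x) ** 2 for row in X for x in row)
    Xs = [sum(x * sk for x, sk in zip(row, s)) for row in X]
    return trace - sum(abs(v) ** 2 for v in Xs) / energy


# Toy data (hypothetical): N = 3 antennas, T = 4 QPSK symbols, no noise.
h_true = [1 + 1j, 2 - 1j, -0.5j]
s_true = [1 + 0j, 1j, -1 + 0j, -1j]
X = [[hn * sk.conjugate() for sk in s_true] for hn in h_true]
```

In this noise-free toy case the transmitted sequence attains a reduced cost of zero, while any candidate that is not a scalar multiple of it attains a strictly positive cost.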
III. Joint ML Channel Estimation and Data
Detection Algorithm for Constant-modulus
Constellation

In this section, we provide the joint ML channel estimation
and data detection algorithm for constant-modulus
constellations. In addition, we will show that the expected
complexity of the proposed algorithm is polynomial in the
channel coherence time.

A. ML Non-coherent Algorithm for Constant-modulus Constellation

As pointed out in [?], if the modulation constellation is
constant-modulus (such as QPSK), the minimization of (4)
over $s^*$ is equivalent to solving the following problem:

$$\max_{s \in \Omega^T} s^* X^* X s. \tag{5}$$

The quadratic form in (5) for a constant-modulus modulation
can be changed into an equivalent minimization problem by
using the maximum eigenvalue of $\frac{X^* X}{N}$. Thus, (5) can be
represented as

$$\min_{s \in \Omega^T} s^* \underbrace{\Big( \rho I - \frac{X^* X}{N} \Big)}_{=\Xi} s, \tag{6}$$

where $\rho$ is a value slightly larger than the maximum
eigenvalue of $\frac{X^* X}{N}$. One way of solving the integer
least squares optimization problem in (6) is by exhaustive search
over the entire signal space. However, the computational
complexity of the exhaustive search is exponential in T. The
sphere decoder was used in [8] to efficiently solve (6) with a
lower computational complexity than that of the exhaustive search.
Instead of searching over all the hypotheses, the sphere decoder
only looks at the lattice points within a radius r.
More specifically, the sphere decoder only examines sequences
$s^*$ satisfying

$$s^* \Big( \rho I - \frac{X^* X}{N} \Big) s \le r^2. \tag{7}$$

From the way in which $\rho$ is determined, the matrix $\Xi$ in
(6) is positive semidefinite. Hence, we can use the Cholesky
decomposition to factorize $\Xi$ as

$$\Xi = R^* R, \tag{8}$$

where R is a $T \times T$ upper triangular matrix. Now using (8),
we can rewrite (6) as

$$\min_{s \in \Omega^T} s^* \Big( \rho I - \frac{X^* X}{N} \Big) s = \min_{s \in \Omega^T} s^* R^* R s = \min_{s \in \Omega^T} \|R s\|^2. \tag{9}$$

Since R is an upper triangular matrix, $\|Rs\|^2$ can be expanded as

$$M_{s^*} = \sum_{i=1}^{T} \Big| \sum_{k=i}^{T} R_{i,k}\, s_k \Big|^2, \tag{10}$$

where $M_{s^*}$ is the metric of the transmitted vector $s^*$, and
$R_{i,k}$ is the entry of R in the i-th row and k-th column. For
each i between 1 and T, we further define

$$M_{s^*_{i:T}} = \Big| \sum_{k=i}^{T} R_{i,k}\, s_k \Big|^2 + M_{s^*_{i+1:T}}, \tag{11}$$

where the partial sequence $s^*_{i:T}$ consists of elements $s^*_i$,
..., $s^*_T$, $M_{s^*_{i:T}}$ is the metric of the partial sequence,
and $M_{s^*_{T+1:T}} = 0$ by default.

Now we represent the set of possible sequences in a tree
structure as in [?]. In this tree structure, we have T layers,
and we refer to $s^*_{i:T}$ as a layer-i node in the tree. A tree
node $s^*_{i+1:T}$ is the parent node of $s^*_{i:T}$. Now we are
ready to present the algorithm for joint ML channel estimation
and data detection [8].

Joint ML channel estimation data detection algorithm
Input: radius r, matrix R, constellation $\Omega$ and a 1 x T index
vector I

1) Set i = T, $r_i = r$, I(i) = 1 and set $s^*_i = \Omega(I(i))$.

2) (Computing the bounds) Compute the metric $M_{s^*_{i:T}}$. If
$M_{s^*_{i:T}} > r^2$, go to 3; else, go to 4.

3) (Backtracking) Find the smallest $i \le j \le T$ such that
$I(j) < |\Omega|$. If there exists such j, set i = j and go to 5;
else go to 6.

4) If i = 1, store the current $s^*$, update $r^2 = M_{s^*_{1:T}}$ and
go to 3; else set i = i - 1, I(i) = 1 and $s^*_i = \Omega(I(i))$, and
go to 2.

5) Set I(i) = I(i) + 1 and $s^*_i = \Omega(I(i))$. Go to 2.

6) If any sequence $s^*$ is ever found in Step 4, output the
latest stored full-length sequence as the ML solution;
otherwise, double r and go to 1.

In our analysis of this algorithm for massive SIMO systems,
we will slightly change the last step of the algorithm: if no
sequence is ever found in Step 4, we will increase r to $\infty$.
We also remark that, for downlink beamforming, one can use
the channel estimate generated from (3), plugging in the $s^*$
output by the joint ML algorithm.
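
The six steps above can be sketched as a recursive depth-first search. The following is an illustrative sketch, not the authors' code: it assumes R is supplied as a list of rows of an upper triangular matrix, uses 0-based indices (index T-1 corresponds to layer T), replaces the explicit index vector I with a loop over the constellation, and does not fix the last symbol to resolve phase ambiguity. The name `sphere_decode` is hypothetical.

```python
def sphere_decode(R, constellation, radius):
    """Depth-first search for the minimizer of ||R s||^2 over constellation^T."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    T = len(R)
    s = [0j] * T
    best = {"r2": radius ** 2, "seq": None}

    def descend(i, metric_below):
        for symbol in constellation:
            s[i] = symbol
            row = sum(R[i][k] * s[k] for k in range(i, T))
            metric = metric_below + abs(row) ** 2      # recursion (11)
            if metric > best["r2"]:
                continue                                # Step 2: outside the sphere
            if i == 0:
                # Step 4: full-length sequence found, shrink the radius
                best["r2"], best["seq"] = metric, list(s)
            else:
                descend(i - 1, metric)                  # move one layer down

    while best["seq"] is None:
        descend(T - 1, 0.0)
        if best["seq"] is None:                         # Step 6: double r, restart
            radius *= 2
            best["r2"] = radius ** 2
    return best["seq"], best["r2"]
```

Because a node is pruned only when its partial metric already exceeds the current squared radius, and partial metrics never decrease along a branch, the returned sequence attains the same minimum as an exhaustive search.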

B. Choice of Radius r 

The choice of the radius r has a big influence on the
complexity of this ML algorithm. If $r^2$ is chosen bigger than
the metric of every sequence $s \in \Omega^T$, the ML algorithm may
visit all the tree nodes under that radius. If $r^2$ is too small,
the optimal sequence may have a metric larger than $r^2$, and
the joint ML algorithm will search again under a new, larger
radius.

In [8], the authors derived how to choose r such that, with
a certain probability, the transmitted sequence has a metric no
bigger than $r^2$. However, the choice of radius in [8] is for a
fixed number of receive antennas, and for high signal-to-noise
ratio (SNR).

In this paper, we quantify the choice of radius r when
the number of receive antennas is large, as in massive MIMO
systems. In fact, we set $r^2$ to be any constant c such that

$$r^2 = c < \frac{T D_{\min}}{2},$$

where $D_{\min} = \min_{s_1 \in \Omega,\, s_2 \in \Omega,\, s_1 \neq s_2} \|s_1 - s_2\|^2$
is the minimum squared distance between two constellation points.
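
As a small worked example of this radius condition, the sketch below computes the minimum squared distance of a constellation and the resulting upper limit $T D_{\min}/2$ on $r^2$. The helper names are hypothetical, and the unit-energy QPSK alphabet is an assumed example.

```python
import itertools


def min_squared_distance(constellation):
    """D_min: smallest |s1 - s2|^2 over distinct constellation points."""
    return min(abs(a - b) ** 2
               for a, b in itertools.combinations(constellation, 2))


def radius_upper_limit(constellation, T):
    """Upper limit T * D_min / 2 that r^2 must stay below."""
    return T * min_squared_distance(constellation) / 2


qpsk = [1 + 0j, 1j, -1 + 0j, -1j]   # unit-energy QPSK (assumed example)
limit = radius_upper_limit(qpsk, T=8)
```

For this QPSK alphabet the closest pair of points is at squared distance 2, so with T = 8 any positive constant $r^2$ below 8 satisfies the condition.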

We remark that this choice of radius is different from that
in [8]. More specifically, the new radius value does not depend
on the high SNR approximation in [8], and works for massive
SIMO systems. In fact, one can choose the radius r to be
a positive constant arbitrarily close to 0, for a large SIMO
system. In the next section, we will show that, under this new
radius, the joint ML channel estimation and data detection
algorithm has polynomial expected computational complexity.


C. Algorithm Computational Complexity

The computational complexity of the ML non-coherent data
detection algorithm for SIMO systems is mainly determined
by the number of visited nodes in each layer. By "visited
nodes", we mean the partial sequences $s^*_{i:T}$ for which the
metric $M_{s^*_{i:T}}$ is computed in the algorithm. The fewer the
visited nodes, the lower the computational complexity of the joint
ML algorithm. In this section, we will show that the number
of visited nodes in each layer converges to a constant
for a sufficiently large number of receive antennas.
To simplify the complexity analysis, we further modify Step 6
of the ML algorithm in Section III-A: "If any sequence $s^*$
is ever found in Step 4, output the latest stored full-length
sequence as the ML solution; otherwise, let $r = \infty$ and go
to 1". We call the resulting decoder the "modified sphere
decoder". This modification does not affect the algorithm's
optimality. To analyze the computational complexity of our
algorithm, we further assume the channel vector h has independent
zero-mean unit-variance complex Gaussian components. In addition,
we present our proof for constant-modulus constellations and, in
this subsection, without loss of generality, we assume each symbol
of s has unit energy, i.e.,

$$|s_k|^2 = 1, \quad k = 1, 2, \ldots, T. \tag{12}$$

Theorem III.1. Let $r^2$ be a positive constant smaller than
$\frac{T D_{\min}}{2}$. Then, for the modified sphere decoder in the ML
non-coherent data detection, the expected number of visited points
at layer i converges to $|\Omega|$ for $i \le (T - 1)$, as the number of
receive antennas N goes to infinity. The sphere decoder only
visits one tree node at layer i = T.

Proof of Theorem III.1: The number of visited nodes at
layer i ($1 \le i \le T - 1$) in the joint ML algorithm is equal
to $|\Omega|$ if there is one and only one tree node
$\tilde{s}^*_{(i+1):T}$ such that $M_{\tilde{s}^*_{(i+1):T}} \le r^2$. In fact,
we will prove that the transmitted $s^*_{(i+1):T}$ is the only
sequence satisfying $M_{s^*_{(i+1):T}} \le r^2$, with high probability
as the number of receive antennas $N \to \infty$. To prove this, we
first show that this conclusion is true for the average case with
$\Xi_E = \rho_E I - \frac{E[X^*X]}{N}$, where $\rho_E$ is the maximum
eigenvalue of $\frac{E[X^*X]}{N}$. Then we use concentration results
for $\frac{X^*X}{N}$ to prove that, for $\Xi = \rho I - \frac{X^*X}{N}$,
the transmitted $s^*_{(i+1):T}$ will also be the only sequence
satisfying $M_{s^*_{(i+1):T}} \le r^2$ with high probability.

For the average case, we first derive $E[X^*X]$, and factorize
$\rho_E I - \frac{E[X^*X]}{N}$ using the Cholesky decomposition. Using
the upper triangular matrix generated from the Cholesky
decomposition, we show that the transmitted $s^*_{(i+1):T}$ will be
the only sequence satisfying $M_{s^*_{(i+1):T}} \le r^2$ under
$\Xi = \rho_E I - \frac{E[X^*X]}{N}$.

In fact, we can write X as

$$[X_1\ X_2\ \cdots\ X_T] = [s_1^* h\ \ s_2^* h\ \cdots\ s_T^* h] + [W_1\ W_2\ \cdots\ W_T] = [s_1^* h + W_1\ \ s_2^* h + W_2\ \cdots\ s_T^* h + W_T],$$

where $X_i$ is the i-th column vector of X. Then $E[X^*X]$ is
equal to

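$$E\left[ \begin{bmatrix} (s_1^* h + W_1)^* \\ (s_2^* h + W_2)^* \\ \vdots \\ (s_T^* h + W_T)^* \end{bmatrix} \begin{bmatrix} (s_1^* h + W_1) & (s_2^* h + W_2) & \cdots & (s_T^* h + W_T) \end{bmatrix} \right].$$

Since the entries of h are independent complex Gaussian
random variables with unit variance and zero mean,
$E[h_i^* h_i] = 1$ for every entry $h_i$ of h. After some algebra,
we have

$$\frac{E[X^*X]}{N} = \begin{bmatrix} s_1 s_1^* + \sigma_w^2 & s_1 s_2^* & \cdots & s_1 s_T^* \\ s_2 s_1^* & s_2 s_2^* + \sigma_w^2 & \cdots & s_2 s_T^* \\ \vdots & \vdots & \ddots & \vdots \\ s_T s_1^* & s_T s_2^* & \cdots & s_T s_T^* + \sigma_w^2 \end{bmatrix}. \tag{13}$$

We can see that (13) is a Hermitian matrix with full column
rank. The maximum eigenvalue of $\frac{E[X^*X]}{N}$ is
$\rho_E = T + \sigma_w^2$. Now we can write
$A = \rho_E I - \frac{E[X^*X]}{N}$ as

$$A = \begin{bmatrix} T - s_1 s_1^* & -s_1 s_2^* & \cdots & -s_1 s_T^* \\ -s_2 s_1^* & T - s_2 s_2^* & \cdots & -s_2 s_T^* \\ \vdots & \vdots & \ddots & \vdots \\ -s_T s_1^* & -s_T s_2^* & \cdots & T - s_T s_T^* \end{bmatrix}.$$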

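Using the Cholesky decomposition in [?], we can decompose
$(\rho_E I - \frac{E[X^*X]}{N})$ into $R^* R$, where R is the upper
triangular matrix of the Cholesky decomposition, and can be
formed as

$$R = \begin{bmatrix} L_{1,1} & L_{1,2} & L_{1,3} & \cdots & L_{1,T} \\ 0 & L_{2,2} & L_{2,3} & \cdots & L_{2,T} \\ 0 & 0 & L_{3,3} & \cdots & L_{3,T} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & L_{T,T} \end{bmatrix},$$

where $L_{i,i} = \sqrt{A_{i,i} - \sum_{k=1}^{i-1} L_{k,i} L_{k,i}^*}$,
$L_{i,j} = \frac{1}{L_{i,i}} \big( A_{i,j} - \sum_{k=1}^{i-1} L_{k,i}^* L_{k,j} \big)$
for $1 \le i < j \le T$, and $A_{i,j}$ is the entry
of $(\rho_E I - \frac{E[X^*X]}{N})$ with row index i and column index j.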

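Thus, R is given by (14):

$$R = \begin{bmatrix} \sqrt{T-1} & \frac{-(s_1 s_2^*)}{\sqrt{T-1}} & \frac{-(s_1 s_3^*)}{\sqrt{T-1}} & \cdots & \frac{-(s_1 s_T^*)}{\sqrt{T-1}} \\ 0 & L_{2,2} & \frac{1}{L_{2,2}} \Big[ -(s_2 s_3^*) - \frac{(s_2 s_3^*)}{T-1} \Big] & \cdots & \frac{1}{L_{2,2}} \Big[ -(s_2 s_T^*) - \frac{(s_2 s_T^*)}{T-1} \Big] \\ 0 & 0 & L_{3,3} & \cdots & \frac{1}{L_{3,3}} \Big[ -(s_3 s_T^*) - \frac{(s_3 s_T^*)}{T-1} - \frac{(s_3 s_T^*)\, T}{(T-1)(T-2)} \Big] \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & L_{T,T} \end{bmatrix} \tag{14}$$

We can see that
$L_{i,i} = \sqrt{(T - 1) - \sum_{j=1}^{i-1} \frac{T}{(T-j)(T-j+1)}}$,
$1 \le i \le T$. Now we can use R in (14) as the upper triangular
matrix of the Cholesky decomposition to solve the minimization
problem. In fact, based on (10), the metric $M_{s^*}(R)$
from (6) is

$$M_{s^*} = s^* A s = s^* (T I - s s^*) s = T s^* s - s^* s s^* s = T \cdot T - T \cdot T = 0, \tag{15}$$

since $s^* s = T$. Because
$M_{s^*} = \sum_{i=1}^{T} \big| \sum_{k=i}^{T} L_{i,k} s_k \big|^2$ from (10),
we must have $\big| \sum_{k=i}^{T} L_{i,k} s_k \big|^2 = 0$ for every
$1 \le i \le T$. This, in turn, implies that $M_{s^*_{i:T}} = 0$, and
$\sum_{k=i}^{T} L_{i,k} s_k = 0$ for every $1 \le i \le T$. On the other
hand, according to Lemma III.2 (the proof of which is provided in
the appendix), for any other $\tilde{s} \neq s$,
$M_{\tilde{s}^*_{i:T}} \neq 0$, where i is the integer closest to T such
that $\tilde{s}^*_i \neq s^*_i$.

Lemma III.2. Let $s^*$ be the transmitted data sequence. Let
us consider using $\rho_E I - \frac{E[X^*X]}{N}$ for calculating the
sequence metric. For any $\tilde{s}^*$ such that $\tilde{s}^* \neq s^*$,
$M_{\tilde{s}^*_{j:T}} \ge \frac{T D_{\min}}{2}$ at any layer $j \le i$,
where i is the largest integer such that $\tilde{s}^*_i \neq s^*_i$.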

When i = T, the joint ML algorithm will visit only 1 tree
node, namely $s^*_T$, whose metric is equal to 0, because $s^*_T$ is
predetermined to resolve phase ambiguity; when i < T, at layer
i, we also have only one sequence $\tilde{s}^*_{i:T} = s^*_{i:T}$ such
that $M_{\tilde{s}^*_{i:T}} = 0$. This proves Theorem III.1 under the
assumption that $X^*X = E[X^*X]$.
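
The average-case argument can be checked numerically on a small example. The sketch below is an illustration under the unit-modulus assumption (12): it evaluates the full-length metric under $A = T I - s s^*$ as $T \|\tilde{s}\|^2 - |s^* \tilde{s}|^2$, and the test data fix the last symbol of every candidate to that of the transmitted sequence, mirroring the phase-ambiguity convention in the text. The function name and the example sequence are hypothetical.

```python
def expected_case_metric(s_true, s_cand):
    """Full-length metric s_cand^* (T I - s s^*) s_cand for unit-modulus s."""
    T = len(s_true)
    energy = sum(abs(x) ** 2 for x in s_cand)
    inner = sum(a.conjugate() * b for a, b in zip(s_true, s_cand))
    # |inner|^2 computed from real and imaginary parts to avoid a square root
    return T * energy - (inner.real ** 2 + inner.imag ** 2)


qpsk = [1 + 0j, 1j, -1 + 0j, -1j]
s = [1j, -1 + 0j, 1 + 0j, -1j]      # hypothetical transmitted QPSK sequence
zero_metric = expected_case_metric(s, s)
```

The transmitted sequence has metric 0, as in (15), and every other QPSK candidate sharing its last symbol has a strictly positive metric, which is why a sufficiently small radius isolates the transmitted branch in the average case.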

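Now we proceed to prove that, with high probability,
$X^*X/N$ is close to $E[X^*X]/N$, and thus the expected
number of visited nodes under $\rho I - \frac{X^*X}{N}$ is very close
to the case for $\rho_E I - \frac{E[X^*X]}{N}$. In fact,
$\frac{(X^*X)_{i,j}}{N}$ can be written as the average of N independent
random variables under the considered channel model:

$$\frac{(X^*X)_{i,j}}{N} = \frac{(s_i^* h + W_i)^* (s_j^* h + W_j)}{N} = \frac{\sum_{k=1}^{N} (s_i^* h_k + w_{k,i})^* (s_j^* h_k + w_{k,j})}{N}$$
$$= s_i s_j^* \frac{\sum_{k=1}^{N} h_k^* h_k}{N} + s_i \frac{\sum_{k=1}^{N} h_k^* w_{k,j}}{N} + s_j^* \frac{\sum_{k=1}^{N} w_{k,i}^* h_k}{N} + \frac{\sum_{k=1}^{N} w_{k,i}^* w_{k,j}}{N}, \tag{16}$$

where $W_i$ is the i-th column of W. Then we can find the
expectation and the variance of (16) as follows:

$$E\Big[ \frac{(X^*X)_{i,j}}{N} \Big] = s_i s_j^* \frac{\sum_{k=1}^{N} E(h_k^* h_k)}{N} + \frac{\sum_{k=1}^{N} E(w_{k,i}^* w_{k,j})}{N} = \begin{cases} 1 + \sigma_w^2, & \text{if } i = j \\ s_i s_j^*, & \text{otherwise} \end{cases} \tag{17}$$

$$\mathrm{var}\Big( \frac{(X^*X)_{i,j}}{N} \Big) = \frac{1 + 2\sigma_w^2 + \sigma_w^4}{N}. \tag{18}$$

We provide the proof of (18) in Appendix C.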

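The weak law of large numbers states that the sample mean
of a random variable converges to its expectation in probability.
Thus, for any pair $1 \le i, j \le T$, for any constants $\xi > 0$ and
$\epsilon > 0$, as $N \to \infty$, we have

$$P\Big( \Big| \frac{(X^*X)_{i,j}}{N} - \frac{E[X^*X]_{i,j}}{N} \Big| \le \epsilon \Big) \ge 1 - \xi. \tag{19}$$

This means that, for any $\xi > 0$ and $\epsilon > 0$, as $N \to \infty$, we
have

$$P\Big( \Big\| \frac{X^*X}{N} - \frac{E[X^*X]}{N} \Big\|_F \le \epsilon \Big) \ge 1 - \xi, \tag{20}$$

where $\| \cdot \|_F$ is the Frobenius norm.

Since $\rho$ is the maximum eigenvalue of $\frac{X^*X}{N}$, by the
triangle inequality for the spectral norm,

$$|\rho - \rho_E| \le \Big\| \frac{X^*X}{N} - \frac{E[X^*X]}{N} \Big\|_2.$$

Since

$$\Big\| \frac{X^*X}{N} - \frac{E[X^*X]}{N} \Big\|_2 \le \Big\| \frac{X^*X}{N} - \frac{E[X^*X]}{N} \Big\|_F,$$

we have

$$|\rho - \rho_E| \le \Big\| \frac{X^*X}{N} - \frac{E[X^*X]}{N} \Big\|_F \le \epsilon$$

with probability at least $1 - \xi$, as $N \to \infty$.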

Using the triangle inequality for the spectral norm and the
Frobenius norm, we have

$$\Big\| \rho I - \frac{X^*X}{N} - \Big( \rho_E I - \frac{E[X^*X]}{N} \Big) \Big\|_2 \le 2\epsilon,$$

and

$$\Big\| \rho I - \frac{X^*X}{N} - \Big( \rho_E I - \frac{E[X^*X]}{N} \Big) \Big\|_F \le (\sqrt{T} + 1)\epsilon,$$

with probability at least $1 - \xi$, as $N \to \infty$.

Now since the Cholesky decomposition of $(\rho I - \frac{X^*X}{N})$ is
continuous at the point $A = \rho_E I - \frac{E[X^*X]}{N}$, for any
$\epsilon > 0$ and $\xi > 0$, as $N \to \infty$,

$$\| R - \tilde{R} \|_F \le \epsilon$$

holds true with probability at least $1 - \xi$. Thus as $N \to \infty$, for
any full-length sequence $\tilde{s}^*$, with probability at least $1 - \xi$,

$$\big| M^{R}_{\tilde{s}^*_{k:T}} - M^{\tilde{R}}_{\tilde{s}^*_{k:T}} \big| \le \|\tilde{s}\|^2\, \| R - \tilde{R} \|_F,$$

which is no bigger than $\|\tilde{s}\|^2 \epsilon$. Note here the
superscripts $R$ and $\tilde{R}$ in $M^{R}_{\tilde{s}^*_{k:T}}$ and
$M^{\tilde{R}}_{\tilde{s}^*_{k:T}}$ describe which upper triangular matrix
is used in calculating the metric.

Since we can take $\epsilon$ to be arbitrarily small, this means that,
for a small enough $\epsilon$, the number of visited nodes per layer will
also be equal to $|\Omega|$ under the matrix $\rho I - \frac{X^*X}{N}$, with
probability at least $(1 - \xi)$. For a small enough constant
$\epsilon > 0$ and any constant $\xi > 0$, as $N \to \infty$, the expected
number of visited nodes at layer i is therefore upper bounded by
$(1 - \xi)|\Omega|$ plus $\xi$ times the largest number of visited nodes
at layer i when $r = \infty$. Taking arbitrarily small $\xi > 0$, the
expected number of visited nodes at layer i will approach $|\Omega|$. ■

In summary, we have shown that, under a fixed $\sigma_w^2$ or SNR,
the sphere decoder can achieve an expected complexity of
polynomial growth. In fact, as stated in Theorem III.3, we
can even lower the SNR requirement for each antenna, while
still providing ML non-coherent detection with polynomial
expected complexity.

Theorem III.3. Let $r^2$ be a positive constant smaller than
$T D_{\min}/2$. If $\sigma_w^2 = o(\sqrt{N})$, then for the modified sphere
decoder for the ML non-coherent data detection, the expected
number of visited points at layer i converges to $|\Omega|$ for
$i \le (T - 1)$, as the number of receive antennas N goes to
infinity. The sphere decoder only visits one tree node at layer
i = T. Here $\sigma_w^2 = o(\sqrt{N})$ means that
$\lim_{N \to \infty} \sigma_w^2 / \sqrt{N} = 0$.

In fact, we can prove Theorem III.3 through the same
arguments as in the proof of Theorem III.1, by noting that the
variance $\mathrm{var}\big( \frac{(X^*X)_{i,j}}{N} \big)$ converges to 0 as
$N \to \infty$, if $\sigma_w^2 = o(\sqrt{N})$.

Since we fix the transmission power and the wireless channel
model, $\sigma_w^2 = o(\sqrt{N})$ means that the SNR per receive antenna
is allowed to decrease, as long as $\mathrm{SNR} \cdot \sqrt{N} \to \infty$ as
$N \to \infty$. For example, the SNR can scale as
$O(\log(\log(N))/\sqrt{N})$ as $N \to \infty$. This implies that we can
achieve ML non-coherent detection with low complexity, while
increasing the energy efficiency of massive SIMO systems.
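
To make the scaling argument concrete, the sketch below evaluates the entry variance $(1 + 2\sigma_w^2 + \sigma_w^4)/N$ from (18) for a noise variance that grows with N but remains $o(\sqrt{N})$. The choice $\sigma_w^2 = N^{1/4}$ and the Chebyshev tail bound are illustrative assumptions, not part of the original analysis, and the function names are hypothetical.

```python
def entry_variance(sigma_w2, N):
    """Variance of one entry of X^* X / N, following (18)."""
    return (1 + 2 * sigma_w2 + sigma_w2 ** 2) / N


def chebyshev_tail(sigma_w2, N, eps):
    """Chebyshev upper bound on P(|entry - mean| > eps) (illustrative)."""
    return min(1.0, entry_variance(sigma_w2, N) / eps ** 2)


# Noise variance growing like N^(1/4), which is o(sqrt(N)) (assumed example).
variances = [entry_variance(N ** 0.25, N) for N in (10 ** 2, 10 ** 4, 10 ** 6)]
```

The listed variances shrink as N grows even though the per-antenna SNR is decreasing, which is the mechanism behind Theorem III.3.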

IV. Joint ML Channel Estimation and Data
Detection Algorithm for Nonconstant-Modulus
Constellations

In Section III, we introduced the joint ML channel estimation
and data detection algorithm for constant-modulus
constellations, and analyzed its expected complexity when
$N \to \infty$. In this section, we extend our work to
nonconstant-modulus constellations, and derive the complexity.
This paper provides the first joint ML channel estimation and
data detection algorithm for massive SIMO systems with
nonconstant-modulus constellations with polynomial expected
complexity.

For nonconstant-modulus constellations, we can change the
maximization in (4) to an equivalent minimization
problem over $s^*$:

$$\min_{s \in \Omega^T} \frac{s^* \big( \rho I - \frac{X^*X}{N} \big) s}{\|s\|^2}, \tag{21}$$

where, again, $\rho$ is slightly larger than the maximum
eigenvalue of $\frac{X^*X}{N}$. Now, $(\rho I - \frac{X^*X}{N})$ is a positive
semidefinite matrix and can be factorized using the Cholesky
decomposition. Then, it can be shown that (21)
can be transformed into another minimization
problem

$$\min_{s \in \Omega^T} \frac{\|R s\|^2}{\|s\|^2}, \tag{22}$$

where R is the upper triangular matrix of the Cholesky
decomposition.

Since different sequences may have different energy, the
$\|s\|^2$ term in (22) prevents us from solving this minimization
problem through the regular sphere decoder approach. As a
result, directly applying the approach of Section III to (22)
is invalid for nonconstant-modulus constellations.

In our new algorithm, we will instead lower bound
$\frac{\|Rs\|^2}{\|s\|^2}$ for partial sequences $s_{i:T}$, taking sequence
energy into consideration. To illustrate our new approach, we
focus on the 16-QAM constellation $\Omega$, which comprises 16 points
a + bj, where $a \in \{\pm 1, \pm 3\}$ and $b \in \{\pm 1, \pm 3\}$. Note that
in this section, we do not assume constellation points of unit
energy. The maximum energy of a constellation point in 16-QAM is
thus $3^2 + 3^2 = 18$.

To lower bound $\frac{\|Rs\|^2}{\|s\|^2}$, we will divide the sequence s
into two parts $s_{1:i-1}$ and $s_{i:T}$. For any partial sequence
$s^*_{i:T}$, we define a new metric, $\bar{M}_{s^*_{i:T}}$, as

$$\bar{M}_{s^*_{i:T}} = \frac{M_{s^*_{i:T}}}{18(i-1) + \|s^*_{i:T}\|^2}, \tag{23}$$

where $M_{s^*_{i:T}}$ is the metric defined in (11). In fact,
$\bar{M}_{s^*_{i:T}}$ is a lower bound on
$\frac{\|Rs\|^2}{\|s_{1:i-1}\|^2 + \|s_{i:T}\|^2} = \frac{M_{s^*_{1:T}}}{\|s_{1:T}\|^2}$.
We further notice that, for i = 1,
$\bar{M}_{s^*_{1:T}} = \frac{M_{s^*_{1:T}}}{\|s_{1:T}\|^2}$.

For other types of constellations, we can just replace 18 in
(23) by the maximum energy of a constellation point.
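
The lower-bound property of (23) can be illustrated with a short sketch. It assumes R is an upper triangular matrix stored as a list of rows, uses the 1-based layer index i from the text, and takes the maximum symbol energy 18 of 16-QAM as the default; the function names are hypothetical.

```python
MAX_ENERGY_16QAM = 18  # 3^2 + 3^2, the largest symbol energy in 16-QAM


def partial_metric(R, s, i):
    """M_{s_{i:T}} from (11), with 1-based layer index i."""
    T = len(s)
    return sum(abs(sum(R[row][k] * s[k] for k in range(row, T))) ** 2
               for row in range(i - 1, T))


def bounded_metric(R, s, i, max_energy=MAX_ENERGY_16QAM):
    """Normalized metric (23); only the symbols s_i, ..., s_T are used."""
    tail_energy = sum(abs(x) ** 2 for x in s[i - 1:])
    return partial_metric(R, s, i) / (max_energy * (i - 1) + tail_energy)
```

The bound holds because the numerator can only grow when further layers are added, while the denominator in (23) already charges each undecided symbol the maximum possible energy.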

Following the setup above, we now give the joint ML
channel estimation data detection algorithm for
nonconstant-modulus constellations, using the 16-QAM
constellation as one example. Even though the problem is no
longer an integer least squares problem, we can still prove the
optimality of our algorithm under the new metric.

Joint ML channel estimation data detection algorithm for
nonconstant-modulus constellations

Input: radius r, matrix R, constellation $\Omega$ and a 1 x T index
vector I

1) Set i = T, $r_i = r$, I(i) = 1 and set $s^*_i = \Omega(I(i))$.

2) (Computing the bounds) Compute the metric $\bar{M}_{s^*_{i:T}}$. If
$\bar{M}_{s^*_{i:T}} > r^2$, go to 3; else, go to 4.

3) (Backtracking) Find the smallest $i \le j \le T$ such that
$I(j) < |\Omega|$. If there exists such j, set i = j and go to 5;
else go to 6.

4) If i = 1, store the current $s^*$, update $r^2 = \bar{M}_{s^*_{1:T}}$
and go to 3; else set i = i - 1, I(i) = 1 and
$s^*_i = \Omega(I(i))$, and go to 2.

5) Set I(i) = I(i) + 1 and $s^*_i = \Omega(I(i))$. Go to 2.

6) If any sequence $s^*$ is ever found in Step 4, output the
latest stored full-length sequence as the ML solution;
otherwise, double r and go to 1.
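
The steps above can be sketched as a recursive search that prunes on the normalized metric (23). This is an illustrative sketch only: it assumes R is an upper triangular matrix stored as a list of rows, uses 0-based indices so that a node at index i still has i undecided symbols, and takes the maximum symbol energy from the supplied constellation. The name `ml_search_nonconstant` is hypothetical.

```python
def ml_search_nonconstant(R, constellation, radius):
    """Search for the minimizer of ||R s||^2 / ||s||^2 over constellation^T."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    T = len(R)
    max_energy = max(abs(c) ** 2 for c in constellation)  # 18 for 16-QAM
    s = [0j] * T
    best = {"r2": radius ** 2, "seq": None}

    def descend(i, metric_below, energy_below):
        for symbol in constellation:
            s[i] = symbol
            row = sum(R[i][k] * s[k] for k in range(i, T))
            metric = metric_below + abs(row) ** 2
            energy = energy_below + abs(symbol) ** 2
            # Metric (23): each of the i undecided symbols is charged max_energy.
            bound = metric / (max_energy * i + energy)
            if bound > best["r2"]:
                continue                      # Step 2: prune this branch
            if i == 0:
                # Step 4: at a full-length node the bound is the exact metric
                best["r2"], best["seq"] = bound, list(s)
            else:
                descend(i - 1, metric, energy)

    while best["seq"] is None:
        descend(T - 1, 0.0, 0.0)
        if best["seq"] is None:               # Step 6: double r and restart
            radius *= 2
            best["r2"] = radius ** 2
    return best["seq"], best["r2"]
```

Since a branch is discarded only when a lower bound on every full-length metric below it exceeds the current best value, the search returns the same minimum as exhaustive enumeration, in line with Theorem IV.1.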


Theorem IV.1. The proposed joint ML channel estimation and
data detection algorithm outputs the correct joint ML sequence
$s^*$, under nonconstant-modulus constellations, by using the
new metric in (23).

Proof: We note that the algorithm will terminate after
a finite number of doublings of the search radius r. Moreover,
after the final doubling of the radius r, the radius will not
increase anymore in the subsequent search. Let $\hat{s}^*$ be the
final sequence output by the algorithm. We must have, when
the algorithm terminates, $r^2 = \bar{M}_{\hat{s}^*_{1:T}}$. Moreover, we can
claim that any sequence $\tilde{s}^*$ other than $\hat{s}^*$ must have a
partial sequence with metric no smaller than
$\bar{M}_{\hat{s}^*_{1:T}}$; otherwise, the algorithm would explore the
full-length sequence $\tilde{s}^*$, and end up giving a
final $r^2 < \bar{M}_{\hat{s}^*_{1:T}}$, which is a contradiction.

Thus, for any sequence $\tilde{s}^* \neq \hat{s}^*$, there must be an i such
that, for the partial sequence $\tilde{s}^*_{i:T}$,
$\bar{M}_{\tilde{s}^*_{i:T}} \ge \bar{M}_{\hat{s}^*_{1:T}}$. This implies
$\bar{M}_{\tilde{s}^*_{1:T}}$ is no smaller than $\bar{M}_{\hat{s}^*_{1:T}}$,
because $\bar{M}_{\tilde{s}^*_{i:T}}$ is a lower bound
on $\bar{M}_{\tilde{s}^*_{1:T}}$. This proves that indeed $\hat{s}^*$ has the
smallest metric $\bar{M}_{\hat{s}^*_{1:T}}$.


A. Choice of Radius r 

For non-coherent massive SIMO systems, we need to provide
an initial search radius which ensures low computational
complexity. For massive SIMO systems adopting 16-QAM, we
derive the initial search radius as

r 


2 



(24) 


This radius ensures that the optimal solution is inside the search
radius with high probability. We provide the derivation of this
radius (namely Lemma IV.3) in Appendix D. We also analyze
the expected complexity for nonconstant-modulus
constellations. In the end, we show that, even for
nonconstant-modulus constellations, the expected complexity is
also polynomial in the channel coherence length and the number of
antennas. This analysis is similar to that of Section III-C, but
more technically involved. In fact, we show that r can be any
constant close to zero for a sufficiently large number
of receive antennas, irrespective of the SNR.


B. Computational Complexity of ML Algorithm for 
Nonconstant-Modulus Constellations 


Similar to the case of the algorithm for constant-modulus
constellations, we will show that for massive SIMO systems
with nonconstant-modulus constellations, as the number of
receive antennas grows to infinity, the expected number of visited
nodes in each layer will be a constant, namely $|\Omega|$.
Again, to simplify the complexity analysis, we further modify Step
6 of the ML algorithm for nonconstant-modulus constellations:
"If any sequence $s^*$ is ever found in Step 4, output the latest
stored full-length sequence as the ML solution; otherwise, let
$r = \infty$ and go to 1". We also further assume the channel vector
h has independent zero-mean unit-variance complex Gaussian
components, and assume that the 16-QAM constellation is used.

Theorem IV.2. Let $r^2$ be a positive constant smaller than
the lower bound in Lemma IV.3. For a nonconstant-modulus
constellation massive SIMO system with N receive antennas, the
expected number of visited points by the ML channel estimation
and data detection algorithm at layer i converges to $|\Omega|$ for
$i \le (T - 1)$, as $N \to \infty$. The joint ML algorithm only visits
one tree node at layer i = T.


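Following the same analysis as in Section III-C, we can write the
maximum eigenvalue of the expected Hermitian matrix $E[X^*X]/N$ as
$\rho_E = \sum_{k=1}^{T} \|s_k\|^2 + \sigma_w^2$. Then we can represent
$A = \rho_E I - E[X^*X]/N$ as

$$A = \begin{bmatrix}
t - s_1 s_1^* & -s_1 s_2^* & \cdots & -s_1 s_T^* \\
-s_2 s_1^* & t - s_2 s_2^* & \cdots & -s_2 s_T^* \\
\vdots & \vdots & \ddots & \vdots \\
-s_T s_1^* & -s_T s_2^* & \cdots & t - s_T s_T^*
\end{bmatrix},$$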

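where $t = \sum_{k=1}^{T} \|s_k\|^2$. After decomposing A using the Cholesky
decomposition, we can find the entries of R such that $A = R^*R$.
Then we can find an expression for the diagonal entries $L_{i,i}$ of
R, written here in the closed form reached in Appendix D:

$$L_{i,i} = \sqrt{t\left(1 - \frac{\|s_i\|^2}{\|s_{i:T}\|^2}\right)}. \qquad (25)$$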

We can find the metric of the transmitted signal $s^*$ as

$$M_{s^*_{1:T}} = \frac{s^* A s}{\|s\|^2} = \frac{s^*(tI - ss^*)s}{\|s\|^2} = 0,$$

since $s^*s = t$. As a result, $M_{s^*_{i:T}} = 0$ for any partial sequence
$s^*_{i:T}$ of the transmitted sequence $s^*_{1:T}$. On the other hand,
according to Lemma IV.3 (whose proof is given in the appendix),
for any other signal $\tilde{s}^* \neq s^*$, $M_{\tilde{s}^*_{j:T}} > \frac{2}{45}$ at any layer $j \le i$, where
i is the largest integer such that $\tilde{s}^*_i \neq s^*_i$.


Lemma IV.3. Let $s^*$ be the transmitted data sequence. Let
us consider using $\rho_E I - E[X^*X]/N$ in calculating the sequence
metric. For any $\tilde{s}^*$ such that $\tilde{s}^* \neq s^*$, $M_{\tilde{s}^*_{j:T}} > \frac{2}{45}$ at any layer
$j \le i$, where i is the largest integer such that $\tilde{s}^*_i \neq s^*_i$.

Thus if we set $r^2 < \frac{2}{45}$, under the expected matrices, the ML
non-coherent data detection algorithm will only visit $|\Omega|$ nodes
in each layer. Following similar concentration arguments for
the matrix $\rho I - X^*X/N$ as in the proof of Theorem III.1, we can
similarly prove Theorem IV.2.
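The structure of $A = tI - ss^*$ can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper `cholesky_upper`, the random 16-QAM sequence and the block length `T = 8` are hypothetical choices, not the authors' code. It verifies that the transmitted sequence has zero metric and that the Cholesky diagonal matches the closed form $L_{i,i}^2 = t(1 - \|s_i\|^2/\|s_{i:T}\|^2)$ used in Appendix D.

```python
import numpy as np

def cholesky_upper(A):
    """Return an upper-triangular R with A = R^H R for a positive semidefinite A."""
    T = A.shape[0]
    R = np.zeros((T, T), dtype=complex)
    for i in range(T):
        # Pivot; clip tiny negative values caused by rounding (A is singular).
        d = A[i, i].real - np.sum(np.abs(R[:i, i]) ** 2)
        R[i, i] = np.sqrt(max(d, 0.0))
        for j in range(i + 1, T):
            R[i, j] = (A[i, j] - np.sum(np.conj(R[:i, i]) * R[:i, j])) / R[i, i]
    return R

rng = np.random.default_rng(0)
T = 8                                                   # illustrative coherence length
levels = np.array([-3, -1, 1, 3])
s = rng.choice(levels, T) + 1j * rng.choice(levels, T)  # random 16-QAM column vector
t = np.sum(np.abs(s) ** 2)
A = t * np.eye(T) - np.outer(s, np.conj(s))             # A = t I - s s^*

metric_true = (np.conj(s) @ A @ s).real / t             # s^* A s / ||s||^2
R = cholesky_upper(A)
tail_energy = np.cumsum((np.abs(s) ** 2)[::-1])[::-1]   # ||s_{i:T}||^2 for each i
closed_form = np.sqrt(t * (1 - np.abs(s) ** 2 / tail_energy))
```

The last diagonal entry is zero because A is singular along the direction of s, which is why the recursion clips the final pivot instead of calling a library Cholesky routine that expects a strictly positive definite matrix.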

























V. Tree search Algorithm 


In the sections above, we consider each partial sequence 
as a node in a tree structure of T layers. The computational 
complexity of the earlier algorithms heavily depends on how 
the initial search radius r is chosen. Although the search radius 
r is chosen so that the true transmitted sequence is within the 
sphere with high probability, the radius does not guarantee the 
minimum number of visited nodes in the tree search. 

In this section we design a best-first branch-and-bound tree
search algorithm for ML non-coherent data detection that does
not need an assigned initial radius r. We call this algorithm
the Tree Search Algorithm (TSA). In contrast to the algorithms
in Sections III and IV, TSA sets the initial search radius to zero at
the beginning of the algorithm. Then the radius r in TSA
systematically increases until the joint ML solution is found.
This algorithm is guaranteed to visit no more tree nodes than
the algorithms in Sections III and IV. We will show that our previous
complexity results also upper bound the complexity of TSA.
Moreover, we prove that this new TSA applies to
nonconstant-modulus constellations.

We first introduce some terminology for the tree structure
we are using. A partial sequence $\tilde{s}^*_{i:T}$, $1 \le i \le T$,
corresponds to a layer-i node in the tree. A node $\tilde{s}^*_{i:T} = (\tilde{s}^*_i, \tilde{s}^*_{i+1:T})$
is called a child node of its parent node $\tilde{s}^*_{i+1:T}$. The parent
node of any layer-T node is called the root node. In a tree,
any tree node without a child node is called a leaf node. For
example, in (b) of Figure 1, node 1 is the root node, and node
2 is the parent node of node 9.

In the TSA algorithm, we start by constructing a tree which has
only the root node with metric 0. Then in each iteration, the
TSA always first finds the leaf node with the smallest metric,
which is called the seed node. Then the algorithm expands
the tree by adding the seed node's $|\Omega|$ child nodes to the tree,
and, moreover, calculates the metrics of all these child nodes.
The tree search algorithm then iterates this process of finding
the seed node and expanding the tree, until the selected seed
node is a layer-1 node, corresponding to a full-length sequence.
The flow of this algorithm is described below for
constant-modulus constellations (for nonconstant-modulus modulations
we just need to replace $M_{\tilde{s}^*_{i:T}}$ by the corresponding
nonconstant-modulus metric).

Tree search algorithm 

Input: matrix R and constellation $\Omega$.

1) Add the root node, and set its metric to 0. Set $r = 0$;

2) (Find the seed node) Find the leaf node $\tilde{s}^*_{i:T}$ which has
the smallest metric among all the leaf nodes. Select that
leaf node as the seed node. Update $r^2 = M_{\tilde{s}^*_{i:T}}$;

3) If the seed node is a layer-1 node, namely i = 1, then
go to 4; else, add the $|\Omega|$ child nodes of $\tilde{s}^*_{i:T}$ to the tree,
compute the metrics of these child nodes, and go to 2;

4) Terminate the algorithm, and output $\tilde{s}^*_{1:T}$ as the optimal
sequence. Output $r^2$ as the smallest possible metric.
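The four steps above amount to a best-first search over partial sequences, which can be sketched with a priority queue. This is a minimal illustration, not the authors' implementation: the function name `tree_search` is hypothetical, and the partial metric is assumed to be the usual sphere-decoder form $\sum_{k \ge i} |\sum_{j \ge k} R_{k,j} s_j|^2$ for an upper-triangular R, so that the full-length metric equals $\|Rs\|^2$. The training symbol and the normalization needed for nonconstant-modulus constellations are left out.

```python
import heapq
import numpy as np

def tree_search(R, constellation):
    """Best-first search for the sequence s minimizing ||R s||^2 (assumed metric).

    Returns the best sequence, its metric, and the number of nodes added to the tree.
    """
    T = R.shape[0]
    # Heap entries: (metric, tie-breaker, suffix s_{i:T}); the root is the empty suffix.
    heap = [(0.0, 0, ())]
    counter = 1
    visited = 0
    while heap:
        metric, _, suffix = heapq.heappop(heap)   # seed node: leaf with smallest metric
        if len(suffix) == T:                      # full-length sequence: terminate
            return np.array(suffix), metric, visited
        i = T - len(suffix) - 1                   # 0-based row index of the next layer
        for sym in constellation:                 # expand the seed into |constellation| children
            child = (sym,) + suffix
            row_sum = np.dot(R[i, i:], np.array(child))
            heapq.heappush(heap, (metric + abs(row_sum) ** 2, counter, child))
            counter += 1
            visited += 1
    raise RuntimeError("empty tree")

rng = np.random.default_rng(1)
T = 4
qpsk = [1 + 0j, 1j, -1 + 0j, -1j]
R = np.triu(rng.standard_normal((T, T)) + 1j * rng.standard_normal((T, T)))
s_hat, best_metric, visited = tree_search(R, qpsk)
```

Because a child's metric is never smaller than its parent's, the first full-length sequence popped from the heap has the smallest metric among all sequences, so no initial radius is needed.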

Figure 1 shows 3 search iterations for the QPSK constellation
and T = 3. The height of a node represents its metric. In (a),
the root node 1 is selected as the seed node, and expands into
4 child nodes. Then node 2 is chosen as the seed node, and
expands into 4 child nodes. The expansion of node 2 is shown
in (b). The TSA then finds node 5 as the next seed node. The
third search iteration in (c) expands node 5 by adding its 4
children. The TSA algorithm then finds node 9 as the seed
node since it has the smallest metric. Since node 9 is a layer-3
node, the algorithm will terminate and output node 9 as the
ML solution.

Fig. 1. Illustration of tree search algorithm for a tree of 3 layers

A. Computational Complexity of TSA 

In this section, we will show that the TSA algorithm is 
computationally efficient in terms of the number of visited 
nodes. 

Theorem V.1. The TSA outputs the optimal sequence in joint
channel estimation and data detection. Let M be the metric
of the optimal sequence, and let $l$ be the number of sequences
(including partial sequences) that have metrics no bigger than
M. Then the number of visited points by TSA is no more than
$(|\Omega| + 1)l$. Moreover, the TSA algorithm visits no more tree
nodes than the sphere decoders in Sections III and IV.

Proof: We first notice that every full-length sequence $\tilde{s}^*_{1:T}$
is a direct or indirect child node of a leaf node $\tilde{s}^*_{i:T}$ existing at
the termination of the TSA. However, by the TSA, the metric
$M_{\tilde{s}^*_{i:T}}$ must be no smaller than the final $r^2$. Since $M_{\tilde{s}^*_{i:T}}$ is a
lower bound of $M_{\tilde{s}^*_{1:T}}$, we have $M_{\tilde{s}^*_{1:T}} \ge r^2$ at the termination
of the TSA. This proves that the TSA indeed outputs the
optimal sequence, and $r^2 = M$ at its termination.

According to its procedure, the TSA algorithm will not visit
the child nodes of any node B which has a metric bigger than
M, namely node B will not be selected as a seed node in the
tree search. In fact, the TSA will add the full-length optimal
sequence and all its (direct or indirect) parent nodes to the tree
(because a parent node's metric is always no bigger than its
child node's) even before node B is selected as the seed node.
The TSA will then declare the full-length optimal sequence as
the solution, and terminate before node B is ever selected as
a seed node. So the TSA algorithm can only visit tree nodes
which have metric no bigger than M, and possibly their direct
child nodes. This gives an upper bound of $(|\Omega| + 1)l$ on the
total number of visited tree nodes.

To find the optimal sequence, the sphere decoder must have
used a radius r such that $r^2 \ge M$. Thus the sphere decoder
will visit every tree node with metric no bigger than M, and
its child nodes. So the number of visited nodes by the sphere
decoder must be no smaller than that of the TSA. ■

According to Theorem V.1, the TSA will also visit a
polynomial number of nodes on average, as $N \to \infty$.

VI. Simulation Results 

In this section, we simulate the performance and complexity 
of the exact ML algorithm for SIMO systems with N receive 
antennas, under QPSK and nonconstant-modulus 16-QAM. 
Channel matrix entries are generated as i.i.d complex Gaussian 
random variables. We investigate the performance of the ML 
algorithm for N= 10, 50, 100, and 500 receive antennas. We 
compare the performance of the joint ML non-coherent data 
detection algorithm with sub-optimal iterative and non-iterative 
channel estimation and data detection schemes. We use least 
square (LS) and minimum mean square error (MMSE) channel 
estimation for the iterative and non-iterative detection schemes 
(the reader may refer to lIMIl for the LS and MMSE channel 
estimation). 

In each channel coherence block, we embed one symbol
which is known by the receiver to resolve the channel phase
ambiguity at layer T of the data sequence. In the non-iterative
channel estimation scheme, the receiver estimates the channel
vector using this training symbol. Then, the receiver uses
this estimated channel vector to detect the remaining T - 1
transmitted symbols. The iterative suboptimal scheme exploits
the detected data vector from the previous iteration to obtain
a new channel estimate, which, in turn, is used for data
detection in the current iteration. The iterative joint channel
estimation and data detection scheme runs 100 iterations for
each channel coherence block.
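The two suboptimal baselines described above can be sketched as follows. This is a hedged illustration only: it assumes the model $X = hs^* + W$ with the training symbol in the last position, uses a plain LS estimate (the MMSE variant is not shown), and the names `ls_detect` and `nearest_symbol`, the noise level and the dimensions are hypothetical.

```python
import numpy as np

def nearest_symbol(z, constellation):
    """Map each entry of z to the closest constellation point."""
    idx = np.argmin(np.abs(z[:, None] - constellation[None, :]), axis=1)
    return constellation[idx]

def ls_detect(X, pilot, constellation, iterations=0):
    """Pilot-based LS channel estimate followed by symbol-by-symbol detection.

    X is N x T with X = h s^H + W (assumed model); the last symbol equals `pilot`.
    iterations=0 gives the non-iterative scheme; larger values re-estimate the
    channel from the detected data before detecting again.
    """
    h_hat = X[:, -1] / np.conj(pilot)                      # LS estimate from the training symbol
    for _ in range(iterations + 1):
        # Matched filter: h^H x_k / ||h||^2 estimates conj(s_k).
        z = np.conj(h_hat.conj() @ X) / np.vdot(h_hat, h_hat).real
        s_hat = nearest_symbol(z, constellation)
        s_hat[-1] = pilot                                  # the training symbol is known
        h_hat = X @ s_hat / np.vdot(s_hat, s_hat).real     # LS re-estimate from detected data
    return s_hat, h_hat

rng = np.random.default_rng(0)
N, T, sigma = 64, 8, 0.05                                  # illustrative sizes and noise level
qpsk = np.array([1, 1j, -1, -1j])
s = rng.choice(qpsk, T)
s[-1] = 1                                                  # training symbol at layer T
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
W = sigma * (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T)))
X = np.outer(h, np.conj(s)) + W

s_once, _ = ls_detect(X, 1, qpsk, iterations=0)            # non-iterative scheme
s_iter, _ = ls_detect(X, 1, qpsk, iterations=5)            # iterative scheme
```

Both schemes decide each symbol separately from a channel estimate, whereas the joint ML detector searches over whole sequences; that difference is what the SER comparisons in this section measure.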

In Figures 2, 3, 4 and 5, under QPSK modulation, the
symbol error rate (SER) of the ML algorithm is evaluated as a
function of SNR for T = 8 and 20 respectively, along with
the SER of data detection based on the iterative and
non-iterative LS and MMSE channel estimations. It can be seen
that the ML algorithm outperforms the LS and MMSE iterative
and non-iterative channel estimation schemes. For example,
from Figures 2 and 4 we see more than 2 dB improvement
over the iterative channel estimation and data detection, and 3
dB improvement over the non-iterative channel estimation and
data detection for N=100, at 10“^ SER. In Figures 3 and 5,
the ML detector provides a performance improvement of 2 dB
over the iterative scheme and 4.5 dB improvement over the
non-iterative scheme, at 10“^ SER.

We further evaluate the complexities of both the sphere decoder
and the TSA for the QPSK constellation by the average number of
visited nodes in each coherence block. In Figure 6, we obtain
the average number of visited nodes for T=20 at different SNR
values. We use our proposed search radius r for the
sphere decoder. It can be seen that when N increases, the
number of visited nodes significantly decreases. In fact, the
average number of visited nodes for N=500 is steady at 76,
namely the cardinality of the QPSK constellation multiplied by
(T-1) layers. This is consistent with our theoretical prediction
in Theorem III.1. In addition, the TSA further reduces the
complexity, compared with the sphere decoder ML algorithm.
At SNR = -4 dB, our algorithms on average visit only
several hundred nodes for N = 50, and only 76 nodes for N =
500. In comparison, the exhaustive search method will need
to examine $4^{19} \approx 2.75 \times 10^{11}$ hypotheses for each coherence
block. Our algorithms reduce complexity by many
orders of magnitude across a wide range of N.

Figure 7 describes the performance of the ML channel
estimation and data detection algorithm for the
nonconstant-modulus 16-QAM constellation. We choose the coherence
time T = 12, and N = 50, 100 and 500. We can see that our
novel joint ML algorithm provides nearly 5 dB gain over
iterative joint MMSE channel estimation and data detection
algorithms. Under 16-QAM, Figure 8 presents the average
number of visited nodes, under different SNR values, for
sphere decoders with our proposed radius r and for the TSA. The average
is taken over 10® channel coherence blocks. Both algorithms
achieve surprisingly low average computational complexity.
Note that in order to do exhaustive search, one would need
to examine $16^{11} = 1.76 \times 10^{13}$ hypotheses in each coherence
block. For SNR above -4 dB, on average the TSA visits only
176 nodes, a $10^{11}$-fold reduction in complexity compared with
exhaustive search.
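The node counts quoted in this section follow from simple counting, assuming (as in the simulation setup) that one of the T symbols in each block is a known training symbol. The helper name `search_sizes` is illustrative.

```python
def search_sizes(constellation_size, T):
    """Hypotheses for exhaustive search and nodes for one-path-per-layer search.

    One symbol per block is known, so T - 1 symbols remain unknown.
    """
    unknown = T - 1
    exhaustive = constellation_size ** unknown    # every candidate sequence
    per_layer = constellation_size * unknown      # |constellation| nodes in each of T - 1 layers
    return exhaustive, per_layer

qpsk_exhaustive, qpsk_nodes = search_sizes(4, 20)   # QPSK, T = 20
qam_exhaustive, qam_nodes = search_sizes(16, 12)    # 16-QAM, T = 12
qam_reduction = qam_exhaustive / qam_nodes          # about 1e11
```

For QPSK with T = 20 this gives $4^{19} \approx 2.75 \times 10^{11}$ hypotheses against 76 nodes, and for 16-QAM with T = 12 it gives $16^{11} \approx 1.76 \times 10^{13}$ hypotheses against 176 nodes.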

We further extend our SIMO joint ML channel estimation
and data detection algorithm to uplink data detection
in massive MIMO systems with M users. These M users
employ orthogonal training sequences of length M. First,
we estimate the channel using the M orthogonal training
sequences. Then, based on the MMSE channel estimation from
the training sequences, we use MMSE data detection to decode the
transmitted symbols to $\hat{S}^*$, where $\hat{S}^*$ is a matrix of dimension
$M \times T$ containing the M users' data. Next, we use the detected
signal $\hat{S}^*$ to perform MMSE channel estimation again. Now
for each user j, after subtracting the interference from the other
(M-1) users using their estimated channels and detected data,
we perform joint ML channel estimation and data detection
for user j separately. Namely, for user j, the equivalent
optimization problem is given as follows:

$$\min_{h_j,\; s_j^* \in \Omega^T} \|X_j - h_j s_j^*\|^2,$$

where $X_j = X - \sum_{i \neq j} \hat{h}_i \hat{s}_i^*$, $1 \le i, j \le M$, and $\hat{h}_i$ and $\hat{s}_i^*$ are the
estimated channel and detected data for user i respectively.
After we have detected the M users' data in this way, we use
the newly detected data to renew the MMSE channel estimation
for this MIMO system. We perform the MMSE MIMO channel
estimation and the SIMO joint channel estimation and data
detection iteratively 10 times.
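The interference subtraction that produces $X_j$ can be sketched in a few lines. The function name `cancel_interference` and the array layout (estimated channels as columns, detected data as rows) are assumptions made for illustration; with perfect estimates and no noise, the residual is exactly the single-user signal $h_j s_j^*$.

```python
import numpy as np

def cancel_interference(X, H_hat, S_hat, j):
    """Return X_j = X - sum_{i != j} h_i s_i^* for user j.

    H_hat: N x M estimated channels (column i is the estimate of h_i).
    S_hat: M x T detected data (row i is the detected s_i^*).
    """
    others = [i for i in range(H_hat.shape[1]) if i != j]
    return X - H_hat[:, others] @ S_hat[others, :]

rng = np.random.default_rng(0)
N, M, T = 16, 4, 6                                        # illustrative dimensions
H = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
S = rng.choice(np.array([1, 1j, -1, -1j]), size=(M, T))   # QPSK data, one row per user
X = H @ S                                                 # noise-free received block
X_2 = cancel_interference(X, H, S, 2)                     # residual seen by user index 2
```

In the scheme described above, each `X_j` would then be passed to the SIMO joint ML detector, and the loop over users would be repeated after each renewed channel estimate.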

Figure 9 shows the performance of this proposed data
detection scheme for a massive MIMO system with 4 users, and
different numbers of receive antennas at the BS. We employ
QPSK modulation, and assume a channel coherence time
T=20. We compare our scheme with the iterative MMSE channel
estimation and data detection scheme, and with non-iterative MMSE
channel estimation and data detection. For non-iterative
channel estimation and data detection, we perform one-time
MMSE data detection based on the MMSE channel estimation
from the training sequences. In iterative MMSE channel estimation
and data detection, after we get the detected data from MMSE
data detection, we re-estimate the MIMO channel using both
the training sequences and the detected data. This process is iterated
10 times. From Figure 9 we observe that our algorithm
employing the SIMO joint channel estimation and data detection
algorithm achieves better performance than iterative MMSE
channel estimation and data detection. For instance, for N=50
and SER=10“^, our SIMO joint channel estimation and data
detection algorithm has roughly 2 dB gain over non-iterative
MMSE channel estimation and data detection, and 1 dB gain
over the iterative MMSE channel estimation and data detection
scheme. For N=100, our SIMO joint channel estimation and
data detection algorithm has 2 dB gain over non-iterative
MMSE channel estimation and data detection, and 1.5 dB gain
over the iterative channel estimation and data detection scheme at
the same SER.


VII. Conclusions and Future Work

To the best of our knowledge, this paper shows, for the 
first time, the performance of joint ML channel estimation 
and data detection algorithm of massive SIMO wireless sys¬ 
tems, for both constant-modulus and nonconstant-modulus 
constellations. We have shown that, as the number of receive 
antennas grows large, the expected complexity of our proposed 
algorithm is polynomial in the channel coherence time, and the 
number of receive antennas. Simulation results show that the 
ML algorithm has better performance than suboptimal non¬ 
coherent data detection schemes. In addition, our simulation 
results verify our theoretical predictions. 

It would be interesting to further explore the design of efficient
joint ML channel estimation and data detection for general
massive MIMO systems with multiple users or transmit
antennas. Such algorithms would be very useful in reducing pilot
contamination in general massive MIMO systems.

Appendix A
Proof of Lemma III.2

Proof: For any $\tilde{s}^* \neq s^*$, let i be the closest integer to T
such that $\tilde{s}^*_i \neq s^*_i$, where $1 \le i \le T - 1$. Then we can find the
metric based on (11):

$$M_{\tilde{s}^*_{i:T}} = \sum_{k=i}^{T} \Big| \sum_{j=k}^{T} L_{k,j} \tilde{s}_j \Big|^2
= \sum_{k=i+1}^{T} \Big| \sum_{j=k}^{T} L_{k,j} s_j \Big|^2 + \Big| L_{i,i} \tilde{s}_i + \sum_{j=i+1}^{T} L_{i,j} s_j \Big|^2,$$

where $\tilde{s}_j = s_j$ for $j > i$, and $\sum_{k=i+1}^{T} |\sum_{j=k}^{T} L_{k,j} s_j|^2 = M_{s^*_{i+1:T}} = 0$ as proved in
Theorem III.1. Now we can write (10) as

$$M_{\tilde{s}^*_{i:T}} = \Big| L_{i,i} \tilde{s}_i + \sum_{j=i+1}^{T} L_{i,j} s_j \Big|^2 = |L_{i,i}(\tilde{s}_i - s_i)|^2,$$

where we have used the fact that $\sum_{j=i}^{T} L_{i,j} s_j = 0$, as shown in
the proof of Theorem III.1. Since $\tilde{s}_i - s_i \neq 0$ by assumption,
and $L_{i,i} \neq 0$ for $i \neq T$ according to Lemma B.1, $M_{\tilde{s}^*_{i:T}}$ will
not be zero either.

When $\tilde{s}^* \neq s^*$, $M_{\tilde{s}^*_{i:T}}$ is thus lower bounded by
$|L_{i,i}(\tilde{s}_i - s_i)|^2$, $i < T$. The smallest possible value for
$|L_{i,i}(\tilde{s}_i - s_i)|^2$ is given by $i = T - 1$ (see Lemma B.1) and
$|\tilde{s}_i - s_i|^2 = \min_{s_1, s_2 \in \Omega,\, s_1 \neq s_2} |s_1 - s_2|^2$.

Fig. 2. SER vs SNR for joint ML channel estimation and data detection,
iterative and non-iterative LS channel estimation for T = 8 and QPSK.

Fig. 3. SER vs SNR for joint ML channel estimation and data detection,
iterative and non-iterative LS channel estimation with T = 20 and QPSK
modulation.

Fig. 4. SER vs SNR for joint ML channel estimation and data detection,
iterative and non-iterative MMSE channel estimation with T = 8 and QPSK
modulation.

Fig. 5. SER vs SNR for joint ML channel estimation and data detection,
iterative and non-iterative MMSE channel estimation with T = 20 and QPSK
modulation.

Fig. 6. Average number of visited points for T = 20 and QPSK modulation.
Exhaustive search will instead need to examine $2.75 \times 10^{11}$ hypotheses.

Fig. 7. SER vs SNR, for joint ML channel estimation and data detection
and iterative MMSE channel estimation with T = 12 and 16-QAM.

Fig. 8. Average number of visited points, T=12 with 16-QAM. Exhaustive
search will instead need to examine $1.76 \times 10^{13}$ hypotheses.

Fig. 9. SER vs SNR for joint ML channel estimation and data detection,
iterative and non-iterative MMSE of MIMO wireless system, T = 14 and
M = 4

Appendix B
Lemma B.1 and its proof

Lemma B.1. $L_{i,i} \ge \sqrt{T/2}$ for any $1 \le i \le T - 1$, and $L_{T,T}$ is
equal to zero.

Proof:

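$$L_{i,i} = \sqrt{(T-1) - \sum_{j=1}^{i-1} \frac{T}{(T-(j-1))(T-j)}} = \sqrt{T - \frac{T}{T-(i-1)}},$$

where the second equality follows from the telescoping identity
$\frac{T}{(T-(j-1))(T-j)} = \frac{T}{T-j} - \frac{T}{T-(j-1)}$.

When $i = T$, $L_{T,T}$ will be

$$L_{T,T} = \sqrt{T - \frac{T}{T-(T-1)}} = 0.$$

We can also see that $L_{i,i} \ge \sqrt{T/2}$ for any $i < T$, taking equality
when $i = T - 1$. ■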
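The statement of Lemma B.1 can be checked with exact rational arithmetic. The sketch below evaluates the recursion $L_{i,i}^2 = (T-1) - \sum_{j=1}^{i-1} T/((T-(j-1))(T-j))$ and compares it with the closed form $T(T-i)/(T-i+1)$; the function name `diag_squared` and the choice T = 8 are illustrative.

```python
from fractions import Fraction

def diag_squared(T, i):
    """L_{i,i}^2 from the Cholesky recursion for unit-modulus symbols (1-based i)."""
    total = Fraction(T - 1)
    for j in range(1, i):
        total -= Fraction(T, (T - (j - 1)) * (T - j))
    return total

T = 8
values = [diag_squared(T, i) for i in range(1, T + 1)]
closed = [Fraction(T * (T - i), T - i + 1) for i in range(1, T + 1)]
```

The sequence decreases with i, reaches T/2 at i = T - 1 and vanishes at i = T, which is the content of the lemma.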

Appendix C

Derivation of $\mathrm{var}[(X^*X)_{ij}/N]$ in (15)

Proof:

$$\mathrm{var}[(X^*X)_{ij}] = \mathrm{var}\Big[\sum_{k=1}^{N} B_k\Big] = \sum_{k=1}^{N} \mathrm{var}(B_k) = \sum_{k=1}^{N} \big(E[B_k B_k^*] - E[B_k]E[B_k^*]\big),$$

where $B_k = (s_i h_k^* + w_{k,i}^*)(s_j^* h_k + w_{k,j})$. By expansion,
$B_k B_k^*$ is a sum of sixteen products. The leading one is
$s_i s_j^* s_i^* s_j\, h_k^* h_k h_k^* h_k$ with $s_i s_j^* s_i^* s_j = 1$, and every product
that contains an unpaired factor of $h_k$, $w_{k,i}$ or $w_{k,j}$ has zero mean.

Since we already assume that the entries of h are
rotationally-invariant complex Gaussian with unit variance, we can
write $h_k$ as $a + b\sqrt{-1}$, where a and b are independent, and
both follow the Gaussian distribution $\mathcal{N}(0, \frac{1}{2})$. Thus $E[h_k^2] =
E[(h_k^*)^2] = 0$. Furthermore,

$$E[|h_k|^4] = E[(a^2 + b^2)^2] = E[a^4 + b^4 + 2a^2b^2]
= 3\sigma_a^4 + 3\sigma_b^4 + 2\sigma_a^2\sigma_b^2
= 2 \times 3 \times (\tfrac{1}{2})^2 + \tfrac{1}{2} = 2, \qquad (26)$$

where $\sigma_a^2 = \frac{1}{2}$ and $\sigma_b^2 = \frac{1}{2}$ are respectively the variance of a
and b. In the same way, we can find $E[|w|^4] = 2\sigma_w^4$.

Thus, when $i \neq j$,

$$E[B_k B_k^*] = E[|h_k|^4] + E[|w_{k,i}|^2]E[|w_{k,j}|^2]
+ E[|h_k|^2]E[|w_{k,i}|^2] + E[|h_k|^2]E[|w_{k,j}|^2]
= 2 + \sigma_w^4 + 2\sigma_w^2. \qquad (27)$$

When $i = j$,

$$E[B_k B_k^*] = \underbrace{E[|h_k|^4]}_{=2} + \underbrace{E[|w_{k,i}|^4]}_{=2\sigma_w^4}
+ 4E[|h_k|^2]E[|w_{k,i}|^2]
+ \underbrace{s_i^2 E[(h_k^*)^2]E[(w_{k,i})^2] + (s_i^*)^2 E[(h_k)^2]E[(w_{k,i}^*)^2]}_{=0}
= 2 + 2\sigma_w^4 + 4\sigma_w^2. \qquad (28)$$

Moreover, after some algebra,

$$E[B_k]E[B_k^*] = \begin{cases} 1 + 2\sigma_w^2 + \sigma_w^4, & \text{if } i = j \\ s_i s_j^* s_j s_i^* = 1, & \text{otherwise.} \end{cases}$$

Finally,

$$\mathrm{var}(B_k) = E[B_k B_k^*] - E[B_k]E[B_k^*] = \begin{cases} 1 + 2\sigma_w^2 + \sigma_w^4, & \text{if } i = j \\ 1 + 2\sigma_w^2 + \sigma_w^4, & \text{otherwise.} \end{cases}$$

This leads to

$$\mathrm{var}[(X^*X)_{ij}/N] = \frac{1}{N^2} \sum_{k=1}^{N} \mathrm{var}(B_k) = (1 + 2\sigma_w^2 + \sigma_w^4)/N.$$
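The per-antenna variance $1 + 2\sigma_w^2 + \sigma_w^4$ can be checked by simulation. The sketch below is a Monte Carlo illustration under the assumptions of this appendix (unit-modulus symbols, unit-variance circular Gaussian channel, noise variance $\sigma_w^2$); the function name `empirical_var_B`, the sample size and the value $\sigma_w^2 = 0.5$ are arbitrary choices.

```python
import numpy as np

def empirical_var_B(s_i, s_j, same_index, sigma2, n, rng):
    """Monte Carlo estimate of var(B_k), B_k = (s_i h^* + w_i^*)(s_j^* h + w_j)."""
    def cn(var):
        # Circularly symmetric complex Gaussian samples with the given variance.
        return np.sqrt(var / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    h = cn(1.0)
    w_i = cn(sigma2)
    w_j = w_i if same_index else cn(sigma2)     # i = j shares the same noise entry
    B = (s_i * np.conj(h) + np.conj(w_i)) * (np.conj(s_j) * h + w_j)
    return np.mean(np.abs(B) ** 2) - np.abs(np.mean(B)) ** 2

rng = np.random.default_rng(0)
sigma2 = 0.5
predicted = 1 + 2 * sigma2 + sigma2 ** 2        # same value for i = j and i != j
var_diag = empirical_var_B(1j, 1j, True, sigma2, 200_000, rng)
var_off = empirical_var_B(1j, -1, False, sigma2, 200_000, rng)
```

Both estimates should land close to `predicted`, and dividing by N gives the variance of each entry of $X^*X/N$, which shrinks as the number of receive antennas grows.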


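Appendix D
Proof of Lemma IV.3

Proof: Let us recall that $t = \sum_{k=1}^{T} \|s_k\|^2$. The diagonal entries
of the Cholesky factor satisfy

$$L_{i,i} = \sqrt{t - \|s_i\|^2 - \sum_{j=1}^{i-1} \frac{t\,\|s_j\|^2 \|s_i\|^2}{\|s_{j:T}\|^2\,\|s_{j+1:T}\|^2}} \qquad (29)$$

$$= \sqrt{t - \|s_i\|^2 - t\,\|s_i\|^2 \Big( \frac{1}{\|s_{i:T}\|^2} - \frac{1}{t} \Big)} \qquad (30)$$

$$= \sqrt{t \Big( 1 - \frac{\|s_i\|^2}{\|s_{i:T}\|^2} \Big)}. \qquad (31)$$

We can see that for any $i \neq T$, $\frac{\|s_i\|^2}{\|s_{i:T}\|^2} \neq 1$ and thus $L_{i,i} \neq 0$.
However, when $i = T$,

$$L_{T,T} = \sqrt{t \Big( 1 - \frac{\|s_T\|^2}{\|s_T\|^2} \Big)} = 0.$$

For any $\tilde{s}^*$ such that $\tilde{s}^* \neq s^*$, let i be the largest integer such
that $\tilde{s}^*_i \neq s^*_i$. Then for any $j \le i$,

$$M_{\tilde{s}^*_{j:T}} \ge \frac{L_{i,i}^2\, \|\tilde{s}^*_i - s^*_i\|^2}{\|\tilde{s}^*_{j:T}\|^2 + 18(j-1)}.$$

We would like to give a lower bound on the right side of the
inequality above. We first lower bound $L_{i,i}^2 = t(1 - \frac{\|s_i\|^2}{\|s_{i:T}\|^2})$. The
smallest possible value for t is $t = 2T$ (achieved when every
symbol is in the form of $\pm 1 \pm j$), and the largest possible value
for $\frac{\|s_i\|^2}{\|s_{i:T}\|^2}$ is attained at $i = T - 1$, $\|s_{T-1}\|^2 = 18$, and $\|s_T\|^2 = 2$. Thus
$L_{i,i}^2$ is lower bounded by $2T(1 - \frac{18}{20}) = T/5$. Furthermore,
the smallest possible value for $\|\tilde{s}^*_i - s^*_i\|^2$ is 4, and the largest
possible value for $\|\tilde{s}^*_{j:T}\|^2 + 18(j-1)$ is $18T$. This in turn gives
a lower bound of $4 \times (T/5)/(18T) = 2/45$. ■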
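The bound $L_{i,i}^2 \ge T/5$ and the final constant can be confirmed by enumerating symbol energies. The sketch below assumes 16-QAM points with coordinates in $\{\pm 1, \pm 3\}$, so each symbol energy is 2, 10 or 18, and uses $L_{i,i}^2 = t\,\|s_{i+1:T}\|^2/\|s_{i:T}\|^2$, which is (31) rewritten. The function name `min_diag_squared` and the block length T = 4 are illustrative.

```python
from fractions import Fraction
from itertools import product

ENERGIES = (2, 10, 18)   # |s|^2 for 16-QAM points with coordinates in {+-1, +-3}

def min_diag_squared(T):
    """Smallest L_{i,i}^2 over all energy patterns and all layers i < T."""
    best = None
    for e in product(ENERGIES, repeat=T):
        t = sum(e)
        for i in range(T - 1):                        # 0-based layers 1 .. T-1
            val = Fraction(t * sum(e[i + 1:]), sum(e[i:]))
            best = val if best is None else min(best, val)
    return best

T = 4
smallest = min_diag_squared(T)
metric_bound = 4 * Fraction(T, 5) / (18 * T)          # 4 x (T/5) / (18T)
```

The enumeration shows that T/5 is a valid but not tight bound, since the smallest t and the largest energy ratio cannot occur in the same sequence; the constant 2/45 is therefore conservative.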